SDCOR: Scalable density-based clustering for local outlier detection in massive-scale datasets


Abstract

This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets. Unlike the well-known traditional algorithms, which assume that all data is memory-resident, our proposed method is scalable and processes the input chunk-by-chunk within the confines of a limited memory buffer. A temporary model is built in the first phase; then, it is gradually updated by analyzing consecutive loads of points. Subsequently, at the end of clustering, the approximate structure of the original clusters is obtained. Finally, by another scan of the entire dataset and using a suitable criterion, an outlying score, called SDCOR (Scalable Density-based Clustering Outlierness Ratio), is assigned to each object. Evaluations on real-life and synthetic datasets demonstrate that the proposed method has a low linear time complexity and is more effective and efficient compared to the best-known conventional methods, which need to load all data into memory, and also to some fast distance-based methods, which can operate on data resident on disk.
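The two-pass, chunk-by-chunk idea in the abstract can be illustrated with a minimal Python sketch. It is not the SDCOR algorithm: a fixed-radius "leader" assignment stands in for the paper's density-based clustering, and a distance-to-spread ratio stands in for its scoring criterion. The names `chunks`, `ClusterSummary`, `build_model`, `outlier_scores` and the parameters `chunk_size`, `radius`, `min_size` are hypothetical, chosen only for this illustration.

```python
import math
from typing import Iterable, Iterator, List, Sequence

Point = Sequence[float]


def euclid(p: Point, q: Point) -> float:
    """Plain Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def chunks(points: Iterable[Point], size: int) -> Iterator[List[Point]]:
    """Yield the stream in buffers of at most `size` points."""
    buffer: List[Point] = []
    for p in points:
        buffer.append(p)
        if len(buffer) == size:
            yield buffer
            buffer = []
    if buffer:
        yield buffer


class ClusterSummary:
    """Running count, mean and squared deviations; raw points are not kept."""

    def __init__(self, point: Point) -> None:
        self.n = 1
        self.mean = [float(x) for x in point]
        self.m2 = [0.0] * len(self.mean)

    def add(self, point: Point) -> None:
        # Welford's online update of the mean and sum of squared deviations.
        self.n += 1
        for i, x in enumerate(point):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def spread(self, floor: float = 1e-12) -> float:
        """Root of the summed per-dimension sample variances."""
        if self.n < 2:
            return floor
        return max(math.sqrt(sum(self.m2) / (self.n - 1)), floor)


def build_model(points: Iterable[Point], chunk_size: int,
                radius: float) -> List[ClusterSummary]:
    """First pass: update cluster summaries one chunk at a time.

    ASSUMPTION: fixed-radius assignment, not the paper's clustering step.
    """
    summaries: List[ClusterSummary] = []
    for chunk in chunks(points, chunk_size):
        for p in chunk:
            best, best_d = None, math.inf
            for s in summaries:
                d = euclid(s.mean, p)
                if d < best_d:
                    best, best_d = s, d
            if best is not None and best_d <= radius:
                best.add(p)
            else:
                summaries.append(ClusterSummary(p))
    return summaries


def outlier_scores(points: Iterable[Point], summaries: List[ClusterSummary],
                   min_size: int = 5) -> List[float]:
    """Second pass: distance to the nearest sizeable cluster over its spread.

    ASSUMPTION: this ratio is an illustrative stand-in for the SDCOR score.
    """
    major = [s for s in summaries if s.n >= min_size] or summaries
    return [min(euclid(s.mean, p) / s.spread() for s in major) for p in points]


# Toy data (made up): two tight grids plus one isolated point.
grid = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
data = grid + [(10 + x, 10 + y) for x, y in grid] + [(5.0, 5.0)]
model = build_model(data, chunk_size=8, radius=2.0)
scores = outlier_scores(data, model)
```

In this toy run only one chunk of eight points is held in the buffer at a time, and the isolated point receives the largest score. With a real disk-resident dataset, `points` would be a generator reading the file twice, once per pass.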


Similar resources

A Local Density-Based Approach for Local Outlier Detection

This paper presents a simple but effective density-based outlier detection approach with the local kernel density estimation (KDE). A Relative Density-based Outlier Score (RDOS) is introduced to measure the local outlierness of objects, in which the density distribution at the location of an object is estimated with a local KDE method based on extended nearest neighbors of the object. Instead of...


Local Kernel Density Ratio-Based Feature Selection for Outlier Detection

Selecting features is an important step of any machine learning task, though most of the focus has been to choose features relevant for classification and regression. In this work, we present a novel non-parametric evaluation criterion for filter-based feature selection which enhances outlier detection. Our proposed method seeks the subset of features that represents the inherent characteristic...


Scalable Varied Density Clustering Algorithm for Large Datasets

Finding clusters in data is a challenging problem, especially when the clusters are of widely varied shapes, sizes, and densities. Herein a new scalable clustering technique which addresses all these issues is proposed. In data mining, the purpose of data clustering is to identify useful patterns in the underlying dataset. Within the last several years, many clustering algorithms have been...


Discovering inappropriate billings with local density based outlier detection method

This paper presents an application of a local density based outlier detection method in compliance in the context of public health service management. Public health systems have consumed a significant portion of many governments’ expenditure. Thus, it is important to ensure the money is spent appropriately. In this research, we studied the potentials of applying an outlier detection method to m...


LODES: Local Density Meets Spectral Outlier Detection

The problem of outlier detection has been widely studied in existing literature because of its numerous applications in fraud detection, medical diagnostics, fault detection, and intrusion detection. A large category of outlier analysis algorithms have been proposed, such as proximity-based methods and local density-based methods. These methods are effective in finding outliers distributed along line...



Journal

Journal title: Knowledge-Based Systems

Year: 2021

ISSN: 1872-7409, 0950-7051

DOI: https://doi.org/10.1016/j.knosys.2021.107256